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Abstract 

In many environmental applications, time series are either incomplete or irreg- 
ularly spaced. We investigate the application of the Spartan random process to 
missing data prediction. We employ a novel modified method of moments (MMoM) 
for parameter inference. The CPU time of MMoM is shown to be much faster than 
that of maximum likelihood estimation and almost independent of the data size. We 
formulate an explicit Spartan interpolator for estimating missing data. The model 
validation is performed on both synthetic data and real time series of atmospheric 
aerosol concentrations. The prediction performance is shown to be comparable with 
that attained by the best linear unbiased (Kolmogorov- Wiener) predictor at reduced 
computational cost. 

Key words: inference, precision matrix, gappy data, atmospheric aerosol, fine 
particulate, PM2.5 



1 Introduction 



Time series have wide-ranging applications in environmental monitoring, such 
as air and water quality control. The series carry information about temporal 
autocorrelations (henceforth, correlations) in variables such as atmospheric 
pollutant concentrations, particulate matter, indicators of water clarity, salin- 
ity etc. Knowledge of the correlation structure enables the prediction of time- 
series, estimating the prediction uncertainty, and developing stochastic sim- 
ulations that reconstruct (at least partially) the process of interest. In many 
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applications, observations are irregularly spaced (due to difficulties and cost of 
data acquisition), and they include fully missing (due to measurement inter- 
ruptions) or censored (lying below the equipment detection threshold) data. 
These gaps in the series prevent the use of standard analytical techniques that 
assume regular sampling. Recently, the European Union introduced a novel 
concept in environmental legislatio n, in an effort to establish common data 
quality objectives for air pollution (IBallestal . 120051 ). Hence, there is a need to 
harmonize data obtained by means of different measurement techniques and 
sampling conditions. Difficulties in quantifying uncertainty, due to incomplete 
time coverage, led to simplistic approaches that assume a random distribution 
of the missing data. The associated uncertainty is then reduced to a simple 
function of the percentage of the missing data. However, neglecting the fre- 
quency distribution and temporal correlations of the missing data can strongly 
affect the estimates. Furthermore, the data may be sampled at different tem- 
poral scales. By predicting time series at unmeasured instants, it is possible 
to fill in the missing values or to down-scale (refine) existing measurements. 



Several recent papers ev aluate methods o f time series prediction for environ- 
mental applications. In ( IHousemanl . 120051 ). a first-order autoregressive model 
was applied to irregularly spaced water clarity data. Multivariate models have 
been applied to the time series of pollut ant concentra t ions i n the Arctic, us- 



ing the multiple imputation approach (IHopke et all 120011 ). Time series of 



aerosol particle number concentrations were model ed on the basis of tr affic 



related air pollution a nd meteorological variables (IPaatero et al.l . 120051 ) . In 



( Uunninen et all 12004 ). the authors evaluate and compare various univari- 



ate and multivariate methods for missing data imputation in air quality data 
sets. Generally, the simple univariate models that either utilize only local in- 
formation or make linearity assumptions, are fast but their scope is limited. 
Multivariate models provide higher accuracy and reliability at increased com- 
putational cost. In the current study, we present a novel linear predictor, which 
is based on the use of "pseudo-energy" functionals inspired from statistical 
physics. We apply this Spartan predictor in conjunction with computationally 
efficient parameter inference method. We show that the results obtained with 
the Spartan predictor are comparable with those obtained by the best linear 
unbiased estimator. However, the former is superior in terms of computational 
speed. 



For Gaussian time series, the temporal structure is determined from the auto- 
covariance (henceforth, covariance) matrix, which is estimated from the data. 
In the case of non-uniform sampling steps, the structure function (variogram) 
is typically estimated instead of the covariance for practical reasons. However, 
temporal correlations are also present (albeit on different physical scales) in 
models of statistical physics, e.g., in the Ising model and spin glass models. 
In these models, correlations are imposed by means of physical interactions 
embodied in the energy functional and thus do not need to be calculated from 



the data. Recently , the method of Spartan Spat ial Random Fields (SSRF) 



( jHristopuloa . 120031 ; iHristopulos and Elognd . 120071 ) was proposed as a general 



framework for geostatistical applications. SSRFs are parametrically flexible, 
do not rely on variogram estimation to determine the spatial structure, and 
allow incorporating physical constraints in the joint probability density func- 
tion. In the present study, we define in the same spirit the Spartan Random 
Processes (SRP) and apply them to time series prediction. 

The rest of the paper is organized as follows. Spartan random processes are 
introduced in Section [2J In Section [3J we discuss parameter inference using 
maximum likelihood estimation and the modified method of moments. In Sec- 
tion H] we present the Spartan interpolator and compare it to the Kolmogorov 
Wiener predictor (simple kriging). In Sections [5] and [61 we compare the model 
inference and data prediction methods using both synthetic data as well as 
real time series of aerosol concentration. Finally, we summarize and present 
our conclusions in Section [7J 



2 Spartan Random Processes 



Herein we assume a Gaussian, second-order stationary (lYagloml . 119871 ). de- 
trended time series, X\(t); A is an intrinsic time scale related to temporal res- 
olution. The pdf can be expressed in terms of an energy functional H[X\(t)], 
according to the familiar from statistical physics expression of Gibbs pdf 's: 

MX X ] = ST 1 e-^W (i) 

where the partition function Z is the normalization factor. In the classical 
geostatistical framework the energy functional corresponds to: 

i N N 

H [ X A = .EE^.) [G^X x ( tj ), (2) 
A i=\j=\ 

where [G x ]i"j is the inverse of the covariance matrix (also known as the preci- 
sion matrix), and N is the number of the data points. 



In analogy with Spartan spatial random fields (IHristopulosl . 120031 ) . we de- 
fine the fluctuation-gradient-curvature (FGC) Spartan random process that 
involves four parameters with a well defined physical meaning: the scale co- 
efficient r] , the shape coefficient rji, the characteristic time £, and the cutoff 
circular frequency k c oc A -1 . A kernel function is used to implement the cut- 
off. Below we use a boxcar kernel with sharp spectral cutoff at k c . If time is 
considered as a continuous variable, the FGC Spartan pdf is determined from 



the following energy functional: 

I POO C n • -1 

H igc [X x] 0] = — j ^ dt {[X x (t)} 2 + Vl e[Xx(t)} 2 + eiXx(t)} 2 } , (3) 



where = (t]q, rji, £, k c ) and the dots denote the first and second order time 
derivatives. The covariance spectral density is then given by the following 
expression 

G (k- 6) = hWnt (4) 

where hx(k) — 1, if k < k c and h\(k) = otherwise. The covariance function 
is then obtained from the inverse Fourier transform, given by the following 
integral: 



/oo dh ~ 
^G x (k;0)e* T 
-oo Z7T 



(5) 



For a discrete time series, sampled at the times t n = na, n = 1, . . . , N, a > 0, 
the derivatives are approximated by finite forward differences and the energy 
functional takes the following form: 



1 

H {gc [X X ; 0] = — - £ {S (t n ) + Vl eSi(t n ) + eS2(t n )\ , (6) 



where 



Si{t n ) 



So(t n ) — X\(t n ) 



X x {t n + a)- X x (t r 



a 



n 2 



S2 (t T 



[X x {t n + a)+ X x {t n -a)- 2X x (t n )f 



a:' 



2.1 The Precision Matrix 



In Eq. (j2J), the values of the process are coupled even at distant times through 
the precision matrix. In contrast, in Eq. (JH]) only values between neighboring 
times are coupled. The energy functional is then expressed as follows: 



H igc [x x -e]-lx x (t l )j x (t t ,t 3] e)x x (t J ), 



(7) 



where J x (ti,tj] 6) is the precision matrix. Based on Eq. ([6]), the precision ma- 
trix can be expressed in closed form as follows: 



1 I £ 2 £ 4 

J x (ti,tj;0) = Jofatj) +f] 1 — J 1 (t i ,t j ) + —J 2 (ti,tj) 



a' 



(8) 



where J (ti,tj) = S^j is the identity matrix, Ji(ti,tj) is the gradient precision 
sub-matrix, 
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and J2{ti,tj) is the curvature precision sub- matrix 
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In the discrete representation, given by Eqs. (!7|)-( IT0|) . the Spartan random 
process is equivalent to a Markov random process, the only difference from the 
latter being the nonlinear dependence on the coefficients. However, in contrast 
with Markov processes, the Spartan rand om process can be g eneralized to non- 
uniform sampling patterns as shown in (jElogne et al.l . 120081 ) . 



3 Parameter Inference 



Let T s = {t 1 ,...,t N } be a set of sampling times and X*(T S ) = {X£, ...,X^} 
the vector of sample measurements. Let L(0|X*) denote the likelihood of the 
parameter vector given the data. Let us also define the reduced parameter 
set 0' = (r/i,£, k c ) and the scaled precision matrix J^(0') = r] J x (0). Let the 
sample estimate of the energy functional be Hf gc [X\;0]. The scaled energy 
functional is defined by H' {gc [X\ \ 0'} = r] H igc [X\\ 0}. 



For a complete (with no missing data) series Hf gc [Xy, 0] is given from equa- 
tion ([7]) by replacing X\(ti) with the data X*(ti). 

In the case of series with missing data it is preferable to use the definition 

H igc [x x] 0} = ^- {s^tj + m fsjfi + , (ii) 

where S q are sample estimates of the respective quantities S q , q = 0,1,2, 
evaluated over all the time instants present in the series. The gaps do not 
influence the estimation of Sq, which is performed over all the sample points. 
In the case of Si, the average involves all compact clusters of data that include 
at least two nearest neighbors. Similarly, in the case of 5*2, the average involves 
all compact clusters of data that include at least three nearest neighbors. 



3. 1 Maximum Likelihood Estimation 



The maximum likelihood estimates (MLE) ar e obta i ned b y minimizing numer- 
ically the negative log-likelihood (NLL), e.g. (jSteinl . Il999l . pp. 169-175), which 
requires the evaluation and inversion of the covariance matrix. The numerical 
operations are computationally intensive, especially for large sample sizes. As 
we show below, the Spartan random process has a significant advantage in 
terms of computational speed. The NLL becomes: 



- log L(0|X*) = H ' isc [Xx; °' ] + £ logfa,) + f log(27r) - \ log 1 4(0') | . (12) 

The estimate fj follows from requiring the derivative of logL(#|X*) with re- 
spect to 7/0 to vanish, leading to, 

* = ^f^l, (13) 

and by replacing r/o with r/o in equation ( TT2l) . the NLL is given in terms of the 
scaled variables H'f gc [X\] 6'] and J'JJ)') as follows: 

- logL(0|X*) = ^ log (jUfV [X x ; 0'} /n) - i log ffid')\ + C N . (14) 



The constant Cn = Y[log(27r) + 1] is independent of 0' and can be dropped 
from the minimization. The NLL is minimized using a numerical optimiza- 
tion method. In the case of the Spartan random process, the computational 



efficiency is gained by the fact that H'f gc [X\; 0'] is estimated without the in- 
version of a full covariance matrix. Moreover, based on equations (Q and (I10p . 
J'-JO') is a pentadiagonal symmetric matrix, so that fast and accurate appro x- 
imations can be used for the evaluation of its determinant (IReuskenl . 120021 ) . 



3.2 Modified Method of Moments 



The modified method of moments (MMoM ) is based on fitting stochastic con- 
straints with their sample counterparts. The constraints are based on short- 
range correlations and are motivated from the terms S m (t n ), m = 0, 1,2, in 
the energy functional ([6]). The stochastic constraints are expressed as follows: 



E[S ) = G x (0), 



(15) 



EIS,} = — [G x (0) - G x (a)} 



E[S 2 



a' 



[3G X (0) + G x (2a) - 4G x (a)] . 



(16) 
(17) 



The stochastic constra ins are functions of obtained from spectral inte- 
grals ( IHristopulosl . 120031 ) . Herein we focus on time series with well defined tem- 
poral resolution and correlation times that exceed this resolution. Then, we can 
assume an infinite band limit (k c — > oo) and suppress the subscript A. The re- 
specti ve Spartan covariance function has been evaluated in lHristopulos and Elogne 
(120071 ). and it is given by the following: 
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for \r)i\ < 2, 
for rji = 2, 
for r)i > 2, 



where h = \r\/£, lj2 = (|2 ± r 7l |) 1 / 2 /2, u 1>2 = (\ Vl ± AI/2) 1 /^ and A = 
\rjl — A\ l l 2 . Hence, the stochastic constraints given by Eqs. (11511171) can be 
expressed analytically in terms of the covariance function. The optimal values 
of the mod el parameters are e stimated by minimizing the following objective 
functional (IHristopulosl . 120031 ) 
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(19) 



The application of this distance metric (DM) is based on the following as- 
sumptions: First, the sample averages S m , m — 0, 1, 2 are accurate and precise 



estimators of the stochastic expectations E'[S m ] of the underlying random pro 
cess (where E'[.] denotes the expectation with respect to the unknown proba- 



bility density). This assumes that ergodic conditions are satisfied, e.g. (lAdler 



198lh . Second, the stochastic expectations E[S m ] of the Gibbs random process 



should approximate the expectations E'[S m ] of the underlying process. 



4 Prediction of Missing Data (Temporal Interpolation) 



Let us assume that there are points where data are missing in the sampling 
set T s . The missing points are included in the prediction set, T p = z P }, 
which is assumed to be disjoint from T s . We denote by X(T P ) the vector 
of estimates (temporal predictions). Finally, let T = T s U T p and X(T) = 
X(T p ) o X*(T S ) be the joint vector of measurements and prediction points. 



Using the Kolmogorov- Wiener theory flKitanidisl . Il997t IWackernaeel l2003h . 
the single-point prediction at the point z p , X(z p ), is given as a linear superpo- 
sition of the data values. The coefficients of the superposition are selected so 
as to ensure zero bias and minimize the mean square error of the prediction, 
leading to the following Kolmogorov- Wiener prediction (KWP) equation: 



X(z n 



G-\0;T B ,T B )G(0;T a . 



tr 



x*(r s ; 



V 



p. 



(20) 



In the above, G(0;T s ,T s ) is the N x N data covariance matrix, G _1 is its 
inverse, G(0;T s , z p ) is the N x 1 covariance matrix between the estimated 
point and the data, and A tr denotes the transpose of the matrix A. 



The Spartan family of covariance functions, Eq. (fl8j) . can be used in KWP 
algorithms to provide new types of spatial dependence. Within the SRP frame- 
work it is also possible to define a new type of linear predictor, which allows 
multiple-point prediction to be performed simultaneously over all points in T p . 



4-1 The Spartan predictor 



The Spartan predictor (SP) defined below, is based on the FGC model. It re- 
lies on maximizing the conditional probability density, / X [X(T P )|X*(T S )]. Con- 
sidering the definition / x [X(T p )|X*(T s )] = / x [X(T p )]// x [X*(T s )], the problem 
reduces to maximizing / X [X(T)]. The latter involves the energy functional over 
the combined set of sampling and prediction points, i.e., 



H igc [X(T);0] = -X tr (T)J x (0)X(T). 



(21) 



Maximizing the conditional probability with respect to the prediction values 
leads to the following linear system of P equations 



dH fgc [X(T);0] 

dX(z p ) 



X(z p ) 



0, p=l,...,P. 



(22) 



The predictions do not depend on 770, since the latter is an overall scaling 
factor for J x (0)- If there are no interactions between the prediction points, i.e. 
J x (0; Z(, Zj) = 0, i, j = 1, P, the linear predictor is expressed explicitly by 



where V(z p ) is the interaction neighborhood of z p , i.e., the set of the points in 
ti G T s that interact with z p . In the case of the FGC functional the interaction 
neighborhood spreads up to the second-nearest neighbor. 

The Spartan predictor is linear and unbiased. Unlike the KWP, it does not 
require computationally expensive calculations of the covariance matrix. Spec- 
ifying an arbitrary search neighborhood for each prediction point is not nec- 
essary, since the SP uses only the data in the immediate interaction neigh- 
borhood. In KWP the estimation is performed sequentially point-by-point by 
solving a linear system of equations. In contrast, SP is a multi-point estima- 
tor and it can be formulated explicitly only if the interaction neighborhoods 
of the estimation points are disjoint. The numerical complexity of multipoint 
SP estimation is 0(P 3 ), while the numerical complexity of KWP is 0(P M 3 ), 
where M is average number of points inside the local search neighborhood of 
the estimation points. In some trivial cases (Fig. [1]), the equivalence between 
the FGC SP and KWP can be shown analytically, using symmetry of the pre- 
cision and covariance matrices of the regularly spaced data. For longer time 
series, the analytical comparison becomes cumbersome and hence, we resort to 
comparison by numerical calculations. In a general case, however, we cannot 
expect equivalence of the two methods, since SP only considers the data from 
the short-range interaction neighborhood while KWP considers all the data 
available (or those from the search neighborhood, generally different from the 
SP interaction neighborhood). Nevertheless, in the multipoint Spartan predic- 
tion, the local information propagates through the system via the interacting 
prediction points, .i.e, system of coupled equations ( 122|) . influencing predic- 
tions at distant locations and, hence, its performance compared to KWP is a 
priori not obvious. 
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Fig. 1. Some trivial time series, consisting of only 3 points, for which equivalence 
between SP and KWP can be shown analytically. The circles denote the known 
data and the crosses the prediction points. 



5 Case Study I: Simulated Data 



All the computations were performed in the Matlab® environment, on a desk- 
top computer with Pentium 4 CPU at 3 GHz and 1 GB of RAM. We evaluate 
the performance of the MLE and the MMoM methods for parameter inference. 
We also compare the performance of the SP and KWP missing data estimators. 
First, we use control time series of sizes N = 100, 200, 300, 500, 1000 based on 
the following covariance functions: (i) Gaussian Gx;G( r ) — cr 2 exp(— h 2 ), (ii) ex- 
ponential G x; e(t) = cr 2 exp(— h), (iii) spherical G , X ;s( r ) — °" 2 (1 ~ l-5h+ 0.5h 3 ) 
for h < 1 and Gx:s( r ) = for h > 1, and (iv) Whittle-Matern G x;M (t) = 
a 2 Y^j(nT) u K u (KT). In the above, a 2 represents the variance, b the correla- 
tion time, h = \r\/b, the normalized time lag, v the smoothness parameter, k 
the inverse length, and K v the modified Bessel function of index v. The sam- 
ples a re generated using the multivariate normal simulation method (jJohnsonl . 



20041 ) . The covariance parameters are: For the Gaussian (cr, b) = (10,3), for 



the exponential and the spherical (a, b) = (10, 5), and for the Whittle-Matern: 
(a,K,u) = (10,1,3.5). For each covariance model and size N, 100 time se- 
ries X* (z = 1,...,100) are generated in order to calculate optimization and 
computer time statistics. 



5. 1 Results: Parameter Inference 



The o ptimization employs the Nelder-Mead simplex search algorithm (IPress et al 



1992) for both the MLE and the MMoM cases. This algorithm is fast, because 
it does not require the computation of a Jacobian matrix. The optimization 
is terminated when both the model parameters and the objective function 
(NLL for the MLE or DM for the MMoM) change between consecutive steps 
less than the specified tolerance, e = 10~ 6 . The initial guesses for the Spartan 
parameters are ^ = a = 1, T]i = — 1 (for the MLE different initial guesses 
rji are in order due to multimodality). 

In Table [TJ we compare the MLE and MMoM parameter inference methods, 
using time series with Gaussian correlations. The mean values of the estimated 
Spartan parameters, (t/q), (r/l) and (£*) are listed, as well as their standard 
deviations S v *, S v * and Sp . The number of the optimization iterations (N it ), 
the optimization CPU time (T cpu ), and the cost function mean value (F*) 
(F* = NLL for MLE and F* = DM for MMoM) at termination are also 
tabulated. The large standard deviations of r] and rji for small N (larger 
for the MMoM), decrease with increasing N. The dependence on 771 can be 
attributed to the shape of the objective functions. 

Table 1 

Spartan parameter inference by MLE and MMoM (in the table abbreviated as ML 
and MM, respectively) on 100 samples for each domain size N with the Gaussian 
covariance dependence (similar results are obtained with the other models). The 
calculated statistics: the mean values of the estimated parameters (t]q), and 
(£*), their standard deviations S v *, S v * and Sp, the number of the optimization 
iterations (Nit), the optimization CPU time (T cpu ), and the cost function mean 
value {F*) at termination. 
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(a) MMoM 
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(b) MLE 

Fig. 2. Top view of (a) the distance metric and (b) the negative log-likelihood func- 
tion of one realization of the time series with the Gaussian covariance dependence 
and the parameters values (cr,b) = (10,3) for N = 100, projected onto the (771, £) 
plane. The yellow circles represent the locations of optimal values of (jji,^*), ob- 
tained from 100 different realizations, calculated by (a) MMoM and (b) MLE. The 
green circles in (b) represent solutions stuck in local minima. 



In Fig. 2(a), the DM function of one realization (N = 100) is projected onto 
the parameter space (771, £)■ The optimal values r/*,^*, obtained from the 100 
realizations are also marked on this plane. The DM has a single minimum 
in the parameter plane. The estimates of i]l tend to spread away from the 
permissibility boundary at r)i = —2, producing a skewed distribution with 



relatively large variance. A similar plot of the NLL function, Fig. 2(b) , exhibits 
two local minima. The optimal values 77*, 



the initial guess r)\ 



(o) 



shown in green are obtained using 
while those in yellow (corresponding to the global 



minimum) are obtained with rj\ 
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Fig. 3. Plots of the true correlation function with the Gaussian dependence and the 
true parameters a = 10, 6 = 3 (circles), and the Spartan ones using the parameter 
sets estimated by MMoM (diamonds) and MLE (squares). 

For interpolation purposes, the differences in the parameter sets obtained by 
means of the two methods are practically negligible. In Fig. [3j the Spartan 
correlation functions obtained with the two sets of parameters are compared 
with the true (Gaussian) correlation. Near the origin and at intermediate 
distances, which are crucial for the interpolation, the correlation functions 
almost coincide. The negative values of 77* causes a negative hole in the Spartan 
correlation at larger distances. This hole is only observed in the Gaussian case 
due to the short range of the constraints. 

Considering the optimization statistics, the value of the DM function (~ 10~ 21 ) 
means an excellent match between the sample and stochastic constraints. The 
MMoM needs more iterations than the MLE to converge. However, the MMoM 
CPU time is much lower and almost insensitive to the domain size, while 
the MLE CPU time grows rapidly with N. Since the number of iterations is 
insensitive to N in both MLE and MMoM, the CPU time per iteration shows 
similar behavior with increasing N as the total CPU time. For example, for 



N = 1000 the MMoM is 3 965 times faster than the MLE. In the case of 
the Gaussian model reasonable CPU times are attained for the domain sizes 
considered. However, the optimization slows down considerably for the other 
covariance models. This is due to the presence of a long flat valley in the DM 
and the NLL functions, which extends to large 771 ~ 10 5 . Convergence requires 
a high number of iterations, O(10 3 ), which, for larger N, is easily manageable 
by MMoM but too expensive computationally in the case of MLE. Hence, in 
the following we will use MMoM for parameter inference. 



5.2 Results: Interpolation 



In the following we investigate the problem of interpolation (missing data 
prediction). Let us denote by p the degree of thinning, i.e., the percentage 
of the missing data. We select one time series with N = 1 000 per covariance 
model and we generate 100 different partitions into a validation set, containing 
660 points at random, and a training set, containing 340 points (i.e., p ~ 0.66.) 
For comparison purposes, in Fig.Hl we show the rjl, £* estimates for p = (full 
data) and the distribution of the r/l, £* values for p = 0.66, calculated by both 
the MLE and the MMoM. 



MLE - 340 points 
o MMoM - 340 points 

• MMoM - 1000 points 

★ MLE- 1000 points 




1.6 1.7 1.8 1.9 



Fig. 4. Optimal rjl,^* estimates from complete series of 1 000 points calculated by 
MLE (rilled star) and MMoM (filled circle). Also, distribution of the optimal nl,£,* 
estimates based on the training set of 340 points calculated by MLE (empty stars) 
and MMoM (empty circles). 



We use both KWP (with search neighborhood extending over the entire se- 
ries) and the Spartan predictor. The covariance function is calculated using 



the asymptotic formulas given by Eqs. (|T8|) . For the analysis of prediction 
performance, the validation points are segregated into 9 different categories, 
based on the number of nearest and next-nearest neighbors that belong in 
the training set. For example, the category i,j = 0, 1,2, includes those 

validation points for which i nearest neighbors and j second-nearest neighbors 
are in the training set. 

The following statistics are calculated, for each category separately, and over 
all the validation points: mean absolute error (MAE), mean relative error 
(MRE), mean absolute relative error (MARE), and root mean square error 
(RMSE). The calculations are performed for the four covariance models and 
results for the Gaussian and the exponential cases are shown in Tables [2] and 
El Overall, validation points with more data in their interaction neighborhood 
display smaller errors than those with fewer data. In the Gaussian and the 
Whittle-Matern cases the KWP performs slightly better than the SP, while 
in the exponential and the spherical cases there is no significant difference 
between the two methods. 

Table 2 

SP and KWP performance comparison on synthetic data with Gaussian covariance 
and parameters: mean m = 50, standard deviation a = 10 and correlation length 
6 = 3. The Spartan parameters were inferred from a training set of 340 points. The 
errors were calculated from the validation set of 660 points. 







(2,2) 


(1,2) 


(0,2) 


(0,1) 


(0,0) 


(1,1) 


(1,0) 


(2,0) 


(2,1) 


Total 


MAE 


SP 


0.40 


1.17 


3.30 


5.36 


7.98 


1.89 


2.93 


0.90 


0.64 


3.83 




KWP 


0.27 


0.96 


3.17 


5.15 


7.85 


1.72 


2.86 


0.86 


0.52 


3.69 


MARE [%} 


SP 


0.8 


2.5 


7.0 


11.5 


17.4 


4.0 


6.2 


1.9 


1.3 


8.2 




KWP 


0.6 


2.0 


6.7 


10.9 


17.1 


3.6 


6.0 


1.8 


1.1 


7.9 


MRE [%) 


SP 


-0.1 


-0.2 


-1.1 


-1.7 


-4.0 


-0.4 


-0.5 


-0.2 


-0.1 


-1.3 




KWP 


0.0 


-0.1 


-0.9 


-1.5 


-3.9 


-0.3 


-0.5 


-0.1 


-0.1 


-1.3 


RMSE 


SP 


0.49 


1.46 


4.13 


7.25 


10.28 


2.42 


3.82 


1.15 


0.84 


6.03 




KWP 


0.32 


1.21 


3.98 


6.43 


9.62 


2.16 


3.61 


1.10 


0.66 


5.51 



6 Case Study II: Aerosol concentration data 

Next, we consider an application of the Spartan predictor to a time series con- 
sisting of concentration measurements of atmospheric aerosol PM2.5 particles 
(i.e., particles of aerodynamic diameter smaller than 2.5/im). These measure- 
ments were sampled on the grounds of the Technical University of Crete using 
the aerosol monitor DustTrak, over a period of almost 10 days in June 2006. 



Table 3 

SP and KWP performance comparison on synthetic data with exponential covari- 
ance and parameters: mean m = 50, standard deviation a = 10 and correlation 
length 6=5. The Spartan parameters were inferred from a training set of 340 
points. The errors were calculated from the validation set of 660 points. 







(2,2) (1,2) (0,2) (0,1) 


(0,0) 


(1A) 


(1,0) 


(2,0) 


(2,1) 


Total 


MAE 


SP 


3.81 


4.08 


4.87 


5.72 


6.76 


4.33 


4.49 


3.60 


3.66 


5.03 




KWP 


3.84 


4.11 


4.87 


5.71 


6.75 


4.36 


4.49 


3.61 


3.68 


5.04 


MARE [%] 


SP 


8.0 


8.6 


10.4 


12.3 


14.6 


9.2 


9.6 


7.6 


7.8 


10.8 




KWP 


8.0 


8.7 


10.4 


12.3 


14.6 


9.3 


9.6 


7.6 


7.8 


10.8 


MRE [%] 


SP 


-0.3 


-1.1 


-1.7 


-2.4 


-3.0 


-1.3 


-1.6 


-0.8 


-1.0 


-1.8 




KWP 


-0.2 


-1.1 


-1.7 


-2.3 


-3.0 


-1.2 


-1.5 


-0.7 


-1.0 


-1.8 


RMSE 


SP 


4.58 


5.11 


6.08 


7.18 


8.46 


5.38 


5.59 


4.46 


4.55 


6.44 




KWP 


4.62 


5.15 


6.08 


7.17 


8.46 


5.40 


5.59 


4.46 


4.58 


6.44 



The original time series consists of 2 831 mean concentration values, mea- 
sured at 5 minute intervals. The correlations in the observed series span over 
large distances (measured in the lag of a = 5 minutes). In order to make 
the interpolation task more challenging, we reduced the length of the series 
by non-overlapping clustering the original data using averages over a = 40 
minute intervals. This resulted in a coarse-grained time series of 353 data 
points that will be used below. The data fail the Kolmogorov-Smirnov nor- 
mality test at the 5% level. Nonetheless, in the following we will model the 
series with a second-order stationary FGC Spartan random process. The mo- 
tivation for this approach is that reasonable interpolation performance can be 
obtained by matching the short-range correlations. 

Table 4 

Spartan model parameters t]q, rj* and £*, number of optimization iterations, Na, the 
optimization CPU time, T cpu , and the final value F* of the objective function (NLL 
for MLE and DM for MMoM), calculated from the complete aerosol concentration 
time series by MLE and MMoM. 



* 


vl 


r 


N it T cpu [s] 


p* 


MLE 5 213.90 


15.54 


4.17 


76.00 16.03 


585.89 


MMoM 7 721.90 


32.51 


3.53 


182.00 0.39 


2E-29 



The Spartan parameters based on the complete time series are shown in Ta- 
ble H]). The number of iterations involved in the MMoM is more than twice as 
high as that for the MLE, but the CPU time is 41 times faster. The final DM 
value of F* = 2 1CU 29 indicates excellent matching between the sample and 
stochastic constraints. In Fig. [5j we compare the correlation functions based 
on the estimated Spartan parameters with the experimental correlation. Very 
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Fig. 5. Plots of the empirical correlation function of the aerosol concentration data 
(circles), and the Spartan ones using the parameters estimated by MMoM (dia- 
monds) and MLE (squares). The lag is measured in units of the sampling step 
a = 40 min. 

good agreement of the experimental and estimated functions at short time 
ranges is observed. 

Table 5 

SP and KWP performance comparison on the aerosol concentration data. The Spar- 
tan parameters were inferred by MLE, using the training set of 121 points and the 
errors were calculated on the validation set of the remaining 232 points. Category 
includes points with i nearest and j next-nearest neighbors from the training 
set. 







(2,2) (1,2) (0,2) (0,1) 


(0,0) (1,1) (1,0) (2,0) (2,1) 


MAE 


SP 


1.75 


2.67 


3.85 


5.30 


8.15 


2.88 


3.24 


2.05 2.03 




KWP 


1.80 


2.68 


3.87 


5.32 


8.10 


2.90 


3.25 


2.06 2.06 


MARE [%] 


SP 


4.3 


6.6 


9.3 


12.5 


19.1 


6.8 


7.5 


4.5 4.9 




KWP 


4.4 


6.7 


9.4 


12.6 


19.0 


6.9 


7.6 


4.5 5.0 


MRE [%} 


SP 


-0.4 


-1.1 


-1.4 


-2.6 


-5.2 


-0.9 


-0.9 


-0.2 -0.8 




KWP 


-0.4 


-0.9 


-1.3 


-2.5 


-5.0 


-0.8 


-0.8 


-0.2 -0.6 


RMSE 


SP 


2.15 


3.72 


5.14 


7.44 


11.04 


4.23 


4.65 


2.92 2.93 




KWP 


2.19 


3.73 


5.12 


7.46 


10.97 


4.25 


4.65 


2.92 2.98 



For the interpolation, the data are partitioned into a training set that involves 
121 randomly selected times, and a validation set including the remaining 
232 values. The parameter inference on the training set is performed by both 



MMoM and MLE. There is no multimodality in either NLL or DM surface. 
However, as shown in Fig. [6j there is considerable scatter in the parameter 
estimates obtained from different training set realizations. 

We calculate the same interpolation error measures as for the synthetic data. 
The results are shown in Table [5], using MLE estimates of the Spartan parame- 
ters. The differences between the SP and the KWP are minimal. The estimates 
obtained using the MLE based Spartan parameters are slightly better than the 
ones based on the MMoM. For comparison, the total mean absolute relative 
error of the MLE-based estimates is only 0.8% (using the Spartan predictor) 
and 0.9% (using the KWP) smaller than the MMoM one. 

In Fig. [7J we illustrate the missing data reconstruction based on the Spartan 
predictor. As the data in Table E] already suggest, the largest deviations of the 
estimates from the actual data are seen in the cases of "isolated" points with 
no training data in their interaction neighborhood, i.e., in the category (0,0). 
The contribution to the total errors from such points can be considerable if the 
missing data represent a large portion of the entire series. In Table |6l the total 
errors are shown versus the percentage of the missing data. The mean values of 
the correlation coefficient (R) between the estimated and actual values at the 
missing points is also included. High correlation values (~ 95%) are observed 
even for p = 0.66. 

Table 6 

SP and KWP performance comparison based on the aerosol concentration time 
series for different sizes of missing data. The Spartan parameters are inferred by 
MMoM using training sets of (1 — p) *N points. The errors are calculated based on 
the remaining p * N points, for N = 353 and p = 0.66, 0.6, 0.4, 0.2. 





P 


0.66 


0.60 


0.40 


0.20 


MAE 


SP 


4.66 


3.86 


2.81 


2.05 




KWP 


4.69 


3.85 


2.81 


2.00 


MARE [%] 


SP 


11.1 


9.1 


6.6 


5.6 




KWP 


11.2 


9.1 


6.6 


5.4 


MRE [%) 


SP 


-2.6 


-1.7 


-0.7 


-1.2 




KWP 


-2.5 


-1.6 


-0.8 


-1.0 


RMSE 


SP 


7.18 


6.00 


4.34 


3.07 




KWP 


7.25 


5.98 


4.34 


2.97 


R 


SP 


0.95 


0.97 


0.99 


0.99 




KWP 


0.95 


0.97 


0.99 


0.99 




0.25 



0.2 



0.15 




0.05 



(a) MMoM 
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Fig. 6. Top view of (a) the distance metric and (b) the negative log-likelihood func- 
tion of the complete aerosol concentration time series, projected onto the (?7i,£) 
plane. The yellow circles represent the locations of optimal values of (?7*,£*), ob- 
tained from the complete data and the yellow stars those obtained from 100 different 
realizations of the training set of 121 points, calculated by (a) MMoM and (b) MLE 
(a few points are out of the scale range). 
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Fig. 7. Aerosol concentration time series reconstruction for one training set real- 
ization of 121 points. The data in training set are marked by empty circles. The 
data in the validation set are marked by crosses. The estimates at the validation 
set locations are marked by filled circles. The time unit is a = 40 minutes. 



7 Conclusions 



We present a framework for the analysis of Gaussian, stationary time series 
based on Spartan random processes. In this framework, the temporal depen- 
dence is determined from 'pseudo-energy' functionals. The modified method 
of moments (MMoM) is proposed for Spartan parameter estimation. Its main 
advantages are low computational complexity (high speed) even for large sam- 
ple sizes. Temporal interpolation is formulated by maximizing the conditional 
probability density function of the Spartan process, based on the available 
data. The method is tested with synthetic data and with a time series of at- 
mospheric PM2.5 aerosol concentration. The Spartan interpolator is shown to 
perform similarly to the standard Kolmogorov- Wiener predictor at reduced 
computational cost. 



In time series modeling, it is often necessary to treat the model parame- 
ters as time- dependent and to estimate them continuously in the "moving 
window" fashion. This approach accounts for lack of stationarity, which is 
a common feature in meteorological data. The "moving window" approach 
has been recently applied to the automatic mapping of rainfall data, using 
maximum l ikelihood for paramete r infer ence and universal kriging as the in- 
terpolator ( iPardo-Iguzquiza et all 120051 ) . Windows are typically overlapping 
and of varying size. In each window, parameter inference and model selec- 
tion (based on the Akaike Information Criterion) are conducted, followed by 



interpolation. Spartan random processes have definite advantages for applica- 
tion in the moving window framework. First, model selection can be bypassed 
thanks to the flexibility of the SRP (compared with two or three parameter 
covariance models). Second, the computational efficiency of Spartan parame- 
ter inference and interpolation would allow for iterative methods of window 
size adjustment that will properly account for the physical conditions at the 
local scale. 
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List of figures 

Fig 1: Some trivial time series, consisting of only 3 points, for which equiva- 
lence between SP and KWP can be shown analytically. The circles denote the 
known data and the crosses the prediction points. 

Fig 2: Top view of (a) the distance metric and (b) the negative log-likelihood 
function of one realization of the time series with the Gaussian covariance 
dependence and the parameters values (a, b) = (10, 3) for N = 100, projected 
onto the (?7i,£) plane. The yellow circles represent the locations of optimal 
values of (r)l,£*), obtained from 100 different realizations, calculated by (a) 
MMoM and (b) MLE. The green circles in (b) represent solutions stuck in 
local minima. 

Fig 3: Plots of the true correlation function with the Gaussian dependence 
and the true parameters a = 10, b = 3 (circles), and the Spartan ones using 
the parameter sets estimated by MMoM (diamonds) and MLE (squares). 

Fig 4: Optimal rjl, estimates from complete series of 1 000 points calculated 
by MLE (filled star) and MMoM (filled circle). Also, distribution of the optimal 
estimates based on the training set of 340 points calculated by MLE 
(empty stars) and MMoM (empty circles). 

Fig 5: Plots of the empirical correlation function of the aerosol concentration 
data (circles), and the Spartan ones using the parameters estimated by MMoM 
(diamonds) and MLE (squares). The lag is measured in units of the sampling 
step a = 40 min. 

Fig 6: Top view of (a) the distance metric and (b) the negative log-likelihood 
function of the complete aerosol concentration time series, projected onto the 
(?7i,£) plane. The yellow circles represent the locations of optimal values of 
(t7i,£*), obtained from the complete data and the yellow stars those obtained 
from 100 different realizations of the training set of 121 points, calculated by 
(a) MMoM and (b) MLE (a few points are out of the scale range). 

Fig 7: Aerosol concentration time series reconstruction for one training set 
realization of 121 points. The data in training set are marked by empty circles. 
The data in the validation set are marked by crosses. The estimates at the 
validation set locations are marked by filled circles. The time unit is a = 40 
minutes. 



